Simulation has played an important role in efficiently evaluating self-driving vehicles. Existing methods mostly rely on heuristic-based simulation, in which traffic participants follow hand-coded rules that cannot produce complex human behaviors. The concept of reactive simulation was therefore proposed to bridge the human-behavior gap between simulation and real-world traffic scenarios by leveraging real-world data. However, these reactive models can easily produce unreasonable behaviors after a few simulation steps, at which point we consider the model to have lost its stability. To the best of our knowledge, no existing work has explicitly discussed or analyzed the stability of reactive simulation frameworks. In this paper, we aim to provide a thorough stability analysis of reactive simulation and propose a solution to enhance stability. Specifically, we first propose a new reactive simulation framework, within which we find that the smoothness and consistency of the simulated state sequences are key factors for stability. We then incorporate a kinematic vehicle model into the framework to improve the closed-loop stability of the reactive simulation. Furthermore, several novel metrics are proposed to better analyze simulation performance.
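The abstract does not specify which kinematic vehicle model is used; a common choice for keeping closed-loop rollouts dynamically feasible is the rear-axle kinematic bicycle model. A minimal sketch, assuming an (x, y, heading, speed) state layout, an illustrative wheelbase, and Euler integration (not the paper's implementation):

```python
import numpy as np

def kinematic_bicycle_step(state, accel, steer, wheelbase=2.7, dt=0.1):
    """One Euler step of a rear-axle kinematic bicycle model.

    state = (x, y, heading, speed); accel and steer are the policy's action
    outputs. Forcing simulated agents through this update keeps successive
    states smooth and physically consistent (no teleporting or sliding).
    """
    x, y, heading, speed = state
    x += speed * np.cos(heading) * dt
    y += speed * np.sin(heading) * dt
    heading += speed * np.tan(steer) / wheelbase * dt
    speed = max(0.0, speed + accel * dt)
    return np.array([x, y, heading, speed])
```

Because the learned model then outputs (accel, steer) instead of raw next positions, every rollout state is reachable under vehicle kinematics, which is exactly the kind of smoothness and consistency constraint the paper identifies as key to stability.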
Trajectory prediction is one of the essential tasks for autonomous vehicles. Recent progress in machine learning has enabled a series of advanced trajectory prediction algorithms. Lately, many researchers have demonstrated the effectiveness of vectorized representations with graph neural networks (GNNs) for trajectory prediction. However, these algorithms either pay little attention to the model's generalizability across various scenarios or simply assume that training and testing data follow similar statistics. In practice, when test scenarios are unseen or out-of-distribution (OOD), the resulting train-test domain shift usually causes a significant degradation in prediction performance, which affects downstream modules and can eventually lead to severe accidents. It is therefore important to thoroughly investigate the generalizability of prediction models, which not only helps identify their weaknesses but also provides insights into how to improve them. This paper proposes a generalization analysis framework that uses feature attribution methods to help interpret black-box models. As a case study, we provide an in-depth generalization analysis of a state-of-the-art graph-based trajectory predictor that utilizes vectorized representations. The results show performance degradation caused by domain shift, and feature attribution provides insights for identifying the potential causes of these problems. Finally, we summarize common prediction challenges and how weighting biases induced by the training process can deteriorate accuracy.
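The abstract does not name a specific attribution method; integrated gradients is one common choice for attributing a trajectory predictor's output to its vectorized input features. A minimal, self-contained sketch in which the model interface, baseline, and scalar target are illustrative assumptions:

```python
import torch

def integrated_gradients(model, x, baseline=None, target_fn=None, steps=64):
    """Approximate integrated gradients of a scalar output w.r.t. input x.

    model: callable mapping an input tensor to predictions.
    target_fn: reduces the model output to a scalar to attribute
               (e.g., final displacement error of the predicted trajectory).
    """
    if baseline is None:
        baseline = torch.zeros_like(x)
    if target_fn is None:
        target_fn = lambda out: out.sum()

    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        score = target_fn(model(point))
        grad, = torch.autograd.grad(score, point)
        total_grad += grad
    # Riemann approximation of the path integral, scaled by the input delta.
    return (x - baseline) * total_grad / steps
```

Attributions that concentrate on a few map or agent features under domain shift can point to the kind of spurious dependencies such an analysis looks for.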
High-fidelity behavior prediction of human drivers is crucial for the efficient and safe deployment of autonomous vehicles, yet it is challenging due to the stochasticity, heterogeneity, and time-varying nature of human behavior. On one hand, a trained prediction model can only capture motion patterns in an average sense, while subtle differences among individuals are hardly reflected. On the other hand, a prediction model trained on one training set may not generalize to a test set drawn from a different scenario or data distribution, limiting transferability and generalizability. In this paper, we apply a $\tau$-step modified Extended Kalman Filter parameter adaptation algorithm (MEKF$_\lambda$) to the driving behavior prediction task, which has not yet been studied in the literature. Using feedback from observed trajectories, the algorithm is applied to neural-network-based models to improve driving behavior prediction across different human subjects and scenarios. A new set of metrics is proposed to systematically evaluate how well online adaptation reduces prediction errors for different individuals and scenarios. Empirical studies on the best layer of the model to adapt and the number of observation steps are also provided.
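As a rough illustration of EKF-style online parameter adaptation (a simplified sketch, not the paper's exact MEKF$_\lambda$ update), one can treat the weights of the network's linear output layer as the filter state, keep the trunk frozen, and update those weights from each newly observed trajectory:

```python
import numpy as np

class EKFLastLayerAdapter:
    """Online EKF adaptation of a linear output layer y = W h + b.

    h is the feature produced by the frozen trunk of the prediction network;
    only the last layer's parameters are treated as the EKF state.
    """

    def __init__(self, W, b, p0=1.0, meas_noise=0.1, forgetting=0.999):
        self.out_dim, self.feat_dim = W.shape
        self.theta = np.concatenate([W.ravel(), b])   # state vector [vec(W); b]
        self.P = p0 * np.eye(self.theta.size)         # state covariance
        self.R = meas_noise * np.eye(self.out_dim)    # observation noise
        self.lam = forgetting                         # forgetting factor

    def predict(self, h):
        n = self.out_dim * self.feat_dim
        W = self.theta[:n].reshape(self.out_dim, self.feat_dim)
        return W @ h + self.theta[n:]

    def update(self, h, y_obs):
        # Jacobian of y = W h + b w.r.t. [vec(W); b] is [I kron h^T, I].
        H = np.hstack([np.kron(np.eye(self.out_dim), h[None, :]),
                       np.eye(self.out_dim)])
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.theta = self.theta + K @ (y_obs - self.predict(h))
        self.P = (np.eye(self.theta.size) - K @ H) @ self.P / self.lam
```

In the $\tau$-step variant described by the paper, the innovation is formed from predictions made $\tau$ steps earlier against the trajectory actually observed since then; the forgetting factor here only loosely plays the role of $\lambda$.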
Accurately predicting the possible behaviors of traffic participants is an essential capability for autonomous vehicles. Since autonomous vehicles need to navigate dynamically changing environments, they are expected to make accurate predictions regardless of where they are and what driving circumstances they encounter. Generalization to unseen domains is therefore critical for prediction models when autonomous vehicles are deployed in the real world. In this paper, we aim to address the domain generalization problem for the vehicle intention prediction task and propose a causal-based time series domain generalization (CTSDG) model. We construct a structural causal model for the vehicle intention prediction task to learn an invariant representation of the input driving data for domain generalization. We further integrate a recurrent latent variable model into our structural causal model to better capture the temporal latent dependencies of time-series input data. The effectiveness of our approach is evaluated on real-world driving data. We demonstrate that our proposed method consistently improves prediction accuracy compared with other state-of-the-art domain generalization and behavior prediction methods.
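The recurrent latent variable component is not specified in detail; the sketch below is a generic VRNN-style cell (prior, encoder, and decoder conditioned on a GRU hidden state) with assumed dimensions, shown only to illustrate how temporal latent dependencies can be modeled, not the paper's architecture:

```python
import torch
import torch.nn as nn

class RecurrentLatentCell(nn.Module):
    """One step of a VRNN-style recurrent latent variable model."""

    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)             # p(z_t | h_{t-1})
        self.encoder = nn.Linear(h_dim + x_dim, 2 * z_dim)   # q(z_t | h_{t-1}, x_t)
        self.decoder = nn.Linear(h_dim + z_dim, x_dim)       # p(x_t | h_{t-1}, z_t)
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)

    def forward(self, x_t, h):
        prior_mu, prior_logvar = self.prior(h).chunk(2, dim=-1)
        post_mu, post_logvar = self.encoder(torch.cat([h, x_t], -1)).chunk(2, -1)
        z_t = post_mu + torch.randn_like(post_mu) * (0.5 * post_logvar).exp()
        x_recon = self.decoder(torch.cat([h, z_t], -1))
        h_next = self.rnn(torch.cat([x_t, z_t], -1), h)
        # KL(q || p) between the two diagonal Gaussians, summed over latent dims.
        kl = 0.5 * (prior_logvar - post_logvar
                    + (post_logvar.exp() + (post_mu - prior_mu) ** 2)
                    / prior_logvar.exp() - 1).sum(-1)
        return x_recon, h_next, kl
```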
While autonomous vehicles still struggle to handle challenging situations during on-road driving, humans have long mastered the essence of driving with efficient, transferable, and adaptable driving capability. By mimicking humans' cognition model and semantic understanding during driving, we present HATN, a hierarchical framework that generates high-quality driving behaviors in multi-agent dense-traffic environments. Our method hierarchically consists of a high-level intention identification policy and a low-level action generation policy. With semantic sub-task definitions and a generic state representation, the hierarchical framework is transferable across different driving scenarios. Moreover, our model is able to capture variations in driving behaviors among individuals and scenarios through an online adaptation module. We demonstrate our algorithm on the trajectory prediction task with real traffic data at intersections and roundabouts, where we conduct extensive studies of the proposed method and show that it outperforms other approaches in terms of prediction accuracy and transferability.
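The specific network design is not given in the abstract; the sketch below only illustrates the hierarchical split between a high-level intention policy and a low-level action policy conditioned on the chosen intention (the encoder, dimensions, and intention set are assumptions):

```python
import torch
import torch.nn as nn

class HierarchicalDrivingPolicy(nn.Module):
    """High-level intention classifier + low-level action generator."""

    def __init__(self, state_dim, num_intentions=3, action_dim=2, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.intention_head = nn.Linear(hidden, num_intentions)
        self.action_head = nn.Sequential(
            nn.Linear(hidden + num_intentions, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, state):
        feat = self.encoder(state)
        intention_logits = self.intention_head(feat)       # e.g. yield / pass / keep
        intention = torch.softmax(intention_logits, dim=-1)
        # Low-level actions are generated conditioned on the inferred intention.
        action = self.action_head(torch.cat([feat, intention], dim=-1))
        return intention_logits, action
```

Because the intention level is defined over semantic sub-tasks rather than scenario-specific geometry, the same two-level structure can be reused across intersections and roundabouts, with an online adaptation step refining the low-level policy per individual.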
Occupational pneumonia (OP) staging is an important task concerning the lung health of subjects. A patient's staging result depends on the staging standard and his or her chest X-ray, so it is essentially an image classification task. However, the distribution of OP data is usually imbalanced, which largely degrades the performance of classification models designed under the assumption that data follow a balanced distribution and leads to inaccurate staging. To achieve accurate OP staging, in this work we propose an OP staging model that can handle imbalanced data. The proposed model adopts the gray-level co-occurrence matrix (GLCM) to extract texture features from chest X-rays and performs classification with a weighted broad learning system (WBLS). Empirical studies on six data cases provided by a hospital show that the proposed model achieves better OP staging than state-of-the-art classifiers on imbalanced data.
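A minimal sketch of the GLCM texture-feature step using scikit-image; the distances, angles, and property set are illustrative choices, since the paper's exact configuration is not given in the abstract:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(xray, distances=(1, 2),
                  angles=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Extract GLCM texture descriptors from an 8-bit grayscale chest X-ray."""
    glcm = graycomatrix(xray, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "dissimilarity", "homogeneity", "energy", "correlation")
    # One value per (distance, angle) pair for each property, flattened into a vector.
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```

The resulting feature vector would then be fed to the weighted broad learning system classifier, whose per-class weighting compensates for the imbalanced stage distribution.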
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS-token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over scratch MIM pre-training on ImageNet-1K classification with the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 mIoU higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
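As a rough illustration of relation-based distillation (a simplified sketch, not TinyMIM's exact loss), one can match the token-to-token affinity maps of student and teacher instead of matching features or the CLS token directly; because the relation matrices are dimension-agnostic, student and teacher widths need not agree:

```python
import torch
import torch.nn.functional as F

def token_relation_distill_loss(student_tokens, teacher_tokens, tau=1.0):
    """Match token-token relations between student and teacher.

    student_tokens: (B, N, Ds), teacher_tokens: (B, N, Dt) patch tokens taken
    from an intermediate layer of each network.
    """
    def relations(tokens):
        tokens = F.normalize(tokens, dim=-1)
        sim = tokens @ tokens.transpose(1, 2) / tau      # (B, N, N) affinities
        return F.log_softmax(sim, dim=-1)

    s_rel = relations(student_tokens)
    t_rel = relations(teacher_tokens).exp()              # teacher as probabilities
    return F.kl_div(s_rel, t_rel, reduction="batchmean")
```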
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
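A simplified sketch of the style-aware adaptation idea (not necessarily the paper's exact operator): the style code selects a soft mixture over several candidate feed-forward weight sets, so the same transformer block produces differently stylized motion for different style codes. The dimensions and number of candidates below are assumptions:

```python
import torch
import torch.nn as nn

class StyleAdaptiveFeedForward(nn.Module):
    """Feed-forward layer whose weights are mixed according to a style code."""

    def __init__(self, d_model, d_hidden, style_dim, num_candidates=8):
        super().__init__()
        self.gate = nn.Linear(style_dim, num_candidates)
        # K candidate weight sets for the first linear projection.
        self.w1 = nn.Parameter(torch.randn(num_candidates, d_model, d_hidden) * 0.02)
        self.b1 = nn.Parameter(torch.zeros(num_candidates, d_hidden))
        self.out = nn.Linear(d_hidden, d_model)

    def forward(self, x, style_code):
        # x: (B, N, d_model), style_code: (B, style_dim)
        mix = torch.softmax(self.gate(style_code), dim=-1)        # (B, K)
        w = torch.einsum("bk,kio->bio", mix, self.w1)             # per-sample weights
        b = mix @ self.b1                                         # (B, d_hidden)
        hidden = torch.relu(torch.einsum("bni,bio->bno", x, w) + b.unsqueeze(1))
        return self.out(hidden)
```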
Decompilation aims to transform a low-level programming language (LPL) (e.g., a binary file) into its functionally equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.